Time Series Data Validation Demo

Introduction

This notebook demonstrates how to apply data validation tests to time series data using the ValidMind MRM Platform and Developer Framework. For model developers in the financial sector, the quality and reliability of time series data are essential for accurate model predictions and robust decision-making.

In this demo, we walk through the test plans tailored for time series data and show how they help you identify potential issues and inconsistencies in the data. The ValidMind MRM Platform and Developer Framework streamline the data validation process, so you can focus on building and refining your models with confidence.

Let’s get started!

Setup

Prepare the environment in three steps. First, import the required libraries and modules. Next, configure the use case by choosing the dataset and its target and feature columns. Finally, connect to the ValidMind MRM Platform, which provides the tools and services used for model validation.

Import Libraries

# Load API key and secret from environment variables
%load_ext dotenv
%dotenv .env

# System libraries
import glob
import os
import pickle

# ML libraries
import pandas as pd

# ValidMind libraries 
import validmind as vm

Use Case Configuration

# Select the demo dataset: 'fred' or 'lending_club'
dataset = 'fred'

if dataset == 'lending_club':
    from validmind.datasets.regression import lending_club
    target_column = ['loan_rate_A']
    feature_columns = ['loan_rate_B', 'loan_rate_C', 'loan_rate_D']
    raw_df = lending_club.load_data()
elif dataset == 'fred':
    from validmind.datasets.regression import fred
    target_column = ['MORTGAGE30US']
    feature_columns = ['FEDFUNDS', 'GS10', 'UNRATE']
    raw_df = fred.load_data()
    # Keep only the target and feature columns
    selected_cols = target_column + feature_columns
    raw_df = raw_df[selected_cols]

Connect to ValidMind MRM Platform

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  project = "clhhz04x40000wcy6shay2oco"
)
Connected to ValidMind. Project: Customer Churn Model - Initial Validation (clhhz04x40000wcy6shay2oco)

Data Description

display(raw_df)
            MORTGAGE30US  FEDFUNDS  GS10  UNRATE
DATE
1947-01-01           NaN       NaN   NaN     NaN
1947-02-01           NaN       NaN   NaN     NaN
1947-03-01           NaN       NaN   NaN     NaN
1947-04-01           NaN       NaN   NaN     NaN
1947-05-01           NaN       NaN   NaN     NaN
...                  ...       ...   ...     ...
2023-04-01           NaN       NaN  3.46     NaN
2023-04-06          6.28       NaN   NaN     NaN
2023-04-13          6.27       NaN   NaN     NaN
2023-04-20          6.39       NaN   NaN     NaN
2023-04-27          6.43       NaN   NaN     NaN

3551 rows × 4 columns

raw_df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3551 entries, 1947-01-01 to 2023-04-27
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   MORTGAGE30US  2718 non-null   float64
 1   FEDFUNDS      825 non-null    float64
 2   GS10          841 non-null    float64
 3   UNRATE        903 non-null    float64
dtypes: float64(4)
memory usage: 138.7 KB

Data Preparation

List of Available Test Plans

The vm.test_plans.list_plans() function returns the list of test plans available in the ValidMind (vm) library. Test plans are pre-built sets of tests for automated data and model validation, covering areas such as data quality, exploratory data analysis, and model performance.

vm.test_plans.list_plans()
ID                                  Name                          Description
sklearn_classifier_metrics          SKLearnClassifierMetrics      Test plan for sklearn classifier metrics
sklearn_classifier_validation       SKLearnClassifierPerformance  Test plan for sklearn classifier models
sklearn_classifier_model_diagnosis  SKLearnClassifierDiagnosis    Test plan for sklearn classifier model diagnosis tests
sklearn_classifier                  SKLearnClassifier             Test plan for sklearn classifier models that includes both metrics and validation tests
tabular_dataset                     TabularDataset                Test plan for generic tabular datasets
tabular_dataset_description         TabularDatasetDescription     Test plan to extract metadata and descriptive statistics from a tabular dataset
tabular_data_quality                TabularDataQuality            Test plan for data quality on tabular datasets
normality_test_plan                 NormalityTestPlan             Test plan to perform normality tests.
autocorrelation_test_plan           AutocorrelationTestPlan       Test plan to perform autocorrelation tests.
seasonality_test_plan               SesonalityTestPlan            Test plan to perform seasonality tests.
unit_root                           UnitRoot                      Test plan to perform unit root tests.
stationarity_test_plan              StationarityTestPlan          Test plan to perform stationarity tests.
timeseries                          TimeSeries                    Test plan for time series statsmodels that includes both metrics and validation tests
time_series_data_quality            TimeSeriesDataQuality         Test plan for data quality on time series datasets
time_series_dataset                 TimeSeriesDataset             Test plan for time series datasets
time_series_univariate              TimeSeriesUnivariate          Test plan to perform time series univariate analysis.
time_series_multivariate            TimeSeriesMultivariate        Test plan to perform time series multivariate analysis.
time_series_forecast                TimeSeriesForecast            Test plan to perform time series forecast tests.
regression_model_performance        RegressionModelPerformance    Test plan for statsmodels regressor models that includes both metrics and validation tests

Data Quality

Run Data Quality Test Plan

Use the ValidMind (vm) library to run data quality tests on the time series dataset. First, describe the time_series_data_quality test plan, which bundles the tests that evaluate the quality of time series data.

Next, initialize a ValidMind dataset object, vm_dataset, from the raw DataFrame. The test plan configuration sets the z-score threshold for outlier detection and the minimum threshold for the missing values test.

Finally, run the time_series_data_quality test plan by calling vm.run_test_plan() with the initialized dataset and the configuration. The function applies each test to the dataset and reports the results.

vm.test_plans.describe_plan("time_series_data_quality")
Attribute         Value
ID                time_series_data_quality
Name              TimeSeriesDataQuality
Description       Test plan for data quality on time series datasets
Required Context  ['dataset']
Tests             TimeSeriesOutliers (ThresholdTest), TimeSeriesMissingValues (ThresholdTest), TimeSeriesFrequency (ThresholdTest)
Test Plans        []

vm_dataset = vm.init_dataset(
    dataset=raw_df
)

config = {
    "time_series_outliers": {
        "zscore_threshold": 3,
    },
    "time_series_missing_values": {
        "min_threshold": 2,
    },
}

vm.run_test_plan("time_series_data_quality", dataset=vm_dataset, config=config)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

   Variable   z-score  Threshold       Date
0  FEDFUNDS  3.707038          3 1981-05-01

Results for Time Series Data Quality Test Plan:

Test plan for data quality on time series datasets

Logged the following test result to the ValidMind platform:

Time Series Outliers
Test Name: time_series_outliers
Category: data_quality
Passed: False
Params: {'zscore_threshold': 3}
Metric Plots

Logged the following test result to the ValidMind platform:

Time Series Missing Values
Test Name: time_series_missing_values
Category: data_quality
Passed: False
Params: {'min_threshold': 2}
Metric Plots

Logged the following test result to the ValidMind platform:

Time Series Frequency
Test Name: time_series_frequency
Category: data_quality
Passed: False
Params: {}
Metric Plots
TimeSeriesDataQuality(test_context=TestContext(dataset=Dataset(raw_dataset=            MORTGAGE30US  FEDFUNDS  GS10  UNRATE
DATE                                            
1947-01-01           NaN       NaN   NaN     NaN
1947-02-01           NaN       NaN   NaN     NaN
1947-03-01           NaN       NaN   NaN     NaN
1947-04-01           NaN       NaN   NaN     NaN
1947-05-01           NaN       NaN   NaN     NaN
...                  ...       ...   ...     ...
2023-04-01           NaN       NaN  3.46     NaN
2023-04-06          6.28       NaN   NaN     NaN
2023-04-13          6.27       NaN   NaN     NaN
2023-04-20          6.39       NaN   NaN     NaN
2023-04-27          6.43       NaN   NaN     NaN

[3551 rows x 4 columns], fields=[...], sample=[...], shape={'rows': 3551, 'columns': 4}, ..., type='training', ..., target_column=None, ...), model=None, ...), config={...})
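
The outlier test above flags observations whose z-score exceeds the configured zscore_threshold. The following is a minimal sketch of that idea in plain pandas, not ValidMind's implementation: the function name zscore_outliers, the use of the absolute z-score, the population standard deviation, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def zscore_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Hypothetical helper: list observations whose |z-score| exceeds the threshold."""
    rows = []
    for column in df.columns:
        series = df[column].dropna()
        if series.empty:
            continue
        std = series.std(ddof=0)  # population standard deviation (an assumption)
        if std == 0:
            continue  # a constant series has no outliers
        z = (series - series.mean()) / std
        flagged = z[z.abs() > threshold]
        for date, value in flagged.items():
            rows.append(
                {"Variable": column, "z-score": float(value),
                 "Threshold": threshold, "Date": date}
            )
    return pd.DataFrame(rows, columns=["Variable", "z-score", "Threshold", "Date"])


# Toy monthly series (made-up values): flat at 1.0 with one spike at the end
toy_index = pd.date_range("2000-01-01", periods=50, freq="MS")
toy_df = pd.DataFrame({"x": [1.0] * 49 + [100.0]}, index=toy_index)

outliers = zscore_outliers(toy_df, threshold=3)
print(outliers)
```

Only the final spike is reported, because every other observation lies well within three standard deviations of the mean.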

Handling Frequencies

def identify_frequencies(df):
    """
    Identify the frequency of each series in the DataFrame.

    :param df: Time-series DataFrame
    :return: DataFrame with two columns: 'Variable' and 'Frequency'
    """
    # Map pandas frequency aliases to readable labels; 'ME', 'QE' and 'YE'
    # are the aliases used by newer pandas versions
    labels = {
        'MS': 'Monthly', 'M': 'Monthly', 'ME': 'Monthly',
        'Q': 'Quarterly', 'QE': 'Quarterly',
        'A': 'Yearly', 'YE': 'Yearly',
    }
    frequencies = []
    for column in df.columns:
        # Drop missing values so each series is judged on its own dates
        series = df[column].dropna()
        if len(series) >= 3:
            # pd.infer_freq needs at least 3 dates and returns None
            # when the spacing is irregular
            freq = pd.infer_freq(series.index)
            label = labels.get(freq, freq)
        else:
            label = None

        frequencies.append({'Variable': column, 'Frequency': label})

    return pd.DataFrame(frequencies)


frequencies = identify_frequencies(raw_df)
display(frequencies)
Variable Frequency
0 MORTGAGE30US None
1 FEDFUNDS Monthly
2 GS10 Monthly
3 UNRATE Monthly
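
A frequency of None means that pd.infer_freq could not find a single regular spacing in the index. The toy example below uses hypothetical dates, not the FRED data, to sketch this behavior: a strictly weekly index yields a weekly alias, while moving one observation by a day makes the frequency undetectable. Whether this is what happens with MORTGAGE30US is an assumption that has not been checked against the data.

```python
import pandas as pd

# Four consecutive Thursdays: evenly spaced, so a weekly alias is inferred
regular_index = pd.DatetimeIndex(
    ["2023-04-06", "2023-04-13", "2023-04-20", "2023-04-27"]
)
regular_freq = pd.infer_freq(regular_index)

# Same dates, but the last observation is one day late (a Friday)
irregular_index = pd.DatetimeIndex(
    ["2023-04-06", "2023-04-13", "2023-04-20", "2023-04-28"]
)
irregular_freq = pd.infer_freq(irregular_index)

print("regular:", regular_freq)
print("irregular:", irregular_freq)
```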

Resample

preprocessed_df = raw_df.resample('MS').last()
frequencies = identify_frequencies(preprocessed_df)
display(frequencies)
Variable Frequency
0 MORTGAGE30US Monthly
1 FEDFUNDS Monthly
2 GS10 Monthly
3 UNRATE Monthly
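
To illustrate what resample('MS').last() does when series of different frequencies share one index, here is a small sketch on made-up values. The column names weekly_rate and monthly_rate are hypothetical. Each calendar month becomes one row labeled with the month start, and each column keeps its last non-missing value within that month.

```python
import numpy as np
import pandas as pd

toy_index = pd.to_datetime(
    ["2023-03-01", "2023-03-09", "2023-03-30",
     "2023-04-01", "2023-04-06", "2023-04-27"]
)
toy_df = pd.DataFrame(
    {
        # Weekly-style series: observed on dates inside each month
        "weekly_rate": [np.nan, 6.73, 6.32, np.nan, 6.28, 6.43],
        # Monthly series: observed only on the first day of the month
        "monthly_rate": [4.65, np.nan, np.nan, 4.83, np.nan, np.nan],
    },
    index=toy_index,
)

# One row per month start; last() skips missing values within each month
monthly_df = toy_df.resample("MS").last()
print(monthly_df)
```

Taking the last value is one possible aggregation choice. A mean or first value per month would be equally valid sketches, depending on how the higher-frequency series should be summarized.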

Run Data Quality Test Plan

vm_dataset = vm.init_dataset(
    dataset=preprocessed_df
)
vm.run_test_plan("time_series_data_quality", dataset=vm_dataset, config=config)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

        Variable   z-score  Threshold       Date
0       FEDFUNDS  3.106442          3 1980-03-01
1       FEDFUNDS  3.212296          3 1980-04-01
2       FEDFUNDS  3.537417          3 1980-12-01
3       FEDFUNDS  3.582783          3 1981-01-01
4       FEDFUNDS  3.441645          3 1981-05-01
5       FEDFUNDS  3.587823          3 1981-06-01
6       FEDFUNDS  3.572701          3 1981-07-01
7       FEDFUNDS  3.265222          3 1981-08-01
8   MORTGAGE30US  3.246766          3 1981-09-01
9   MORTGAGE30US  3.271251          3 1981-10-01
10  MORTGAGE30US  3.011098          3 1982-01-01
11        UNRATE  5.011303          3 2020-04-01
12        UNRATE  4.128421          3 2020-05-01

Results for Time Series Data Quality Test Plan:

Test plan for data quality on time series datasets

Logged the following test result to the ValidMind platform:

Time Series Outliers
Test Name: time_series_outliers
Category: data_quality
Passed: False
Params: {'zscore_threshold': 3}
Metric Plots

Logged the following test result to the ValidMind platform:

Time Series Missing Values
Test Name: time_series_missing_values
Category: data_quality
Passed: False
Params: {'min_threshold': 2}
Metric Plots

Logged the following test result to the ValidMind platform:

Time Series Frequency
Test Name: time_series_frequency
Category: data_quality
Passed: True
Params: {}
Metric Plots
TimeSeriesDataQuality(test_context=TestContext(dataset=Dataset(raw_dataset=            MORTGAGE30US  FEDFUNDS  GS10  UNRATE
DATE                                            
1947-01-01           NaN       NaN   NaN     NaN
1947-02-01           NaN       NaN   NaN     NaN
1947-03-01           NaN       NaN   NaN     NaN
1947-04-01           NaN       NaN   NaN     NaN
1947-05-01           NaN       NaN   NaN     NaN
...                  ...       ...   ...     ...
2022-12-01          6.42      4.10  3.62     3.5
2023-01-01          6.13      4.33  3.53     3.4
2023-02-01          6.50      4.57  3.75     3.6
2023-03-01          6.32      4.65  3.66     3.5
2023-04-01          6.43       NaN  3.46     NaN

[916 rows x 4 columns], fields=[...], sample=[...], shape={'rows': 916, 'columns': 4}, ..., type='training', ..., target_column=None, ...), model=None, ...), config={...})

Remove Missing Values

preprocessed_df = preprocessed_df.dropna()
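
As a small illustration of this step, the sketch below counts missing values per column and then applies dropna() to a made-up DataFrame. The column names a and b are hypothetical. By default, dropna() removes every row in which at least one column is missing, so the result covers only the dates on which all series are observed.

```python
import numpy as np
import pandas as pd

toy_index = pd.date_range("2023-01-01", periods=4, freq="MS")
toy_df = pd.DataFrame(
    {
        "a": [np.nan, 1.0, 2.0, 3.0],  # series starts one month late
        "b": [5.0, 6.0, 7.0, np.nan],  # series ends one month early
    },
    index=toy_index,
)

# Number of missing observations in each column
missing_per_column = toy_df.isna().sum()

# Keep only the rows where every column has a value
complete_df = toy_df.dropna()

print(missing_per_column)
print(complete_df)
```

Dropping incomplete rows trims the sample to the overlap of all series. Imputation would be an alternative when that overlap is too short, but it is not what this notebook does.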

Run Data Quality Test Plan

vm_dataset = vm.init_dataset(
    dataset=preprocessed_df,
    target_column=target_column
)
vm.run_test_plan("time_series_data_quality", dataset=vm_dataset, config=config)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

        Variable   z-score  Threshold       Date
0       FEDFUNDS  3.106442          3 1980-03-01
1       FEDFUNDS  3.212296          3 1980-04-01
2       FEDFUNDS  3.537417          3 1980-12-01
3       FEDFUNDS  3.582783          3 1981-01-01
4       FEDFUNDS  3.441645          3 1981-05-01
5       FEDFUNDS  3.587823          3 1981-06-01
6       FEDFUNDS  3.572701          3 1981-07-01
7       FEDFUNDS  3.265222          3 1981-08-01
8   MORTGAGE30US  3.246766          3 1981-09-01
9   MORTGAGE30US  3.271251          3 1981-10-01
10  MORTGAGE30US  3.011098          3 1982-01-01
11        UNRATE  5.011303          3 2020-04-01
12        UNRATE  4.128421          3 2020-05-01

Results for Time Series Data Quality Test Plan:

Test plan for data quality on time series datasets

Logged the following test result to the ValidMind platform:

Time Series Outliers
Test Name: time_series_outliers
Category: data_quality
Passed: False
Params: {'zscore_threshold': 3}
Metric Plots

Logged the following test result to the ValidMind platform:

Time Series Missing Values
Test Name: time_series_missing_values
Category: data_quality
Passed: True
Params: {'min_threshold': 2}

Logged the following test result to the ValidMind platform:

Time Series Frequency
Test Name: time_series_frequency
Category: data_quality
Passed: True
Params: {}
Metric Plots
TimeSeriesDataQuality(test_context=TestContext(dataset=Dataset(raw_dataset=            MORTGAGE30US  FEDFUNDS  GS10  UNRATE
DATE                                            
1971-04-01          7.29      4.16  5.83     5.9
1971-05-01          7.46      4.63  6.39     5.9
1971-06-01          7.54      4.91  6.52     5.9
1971-07-01          7.69      5.31  6.73     6.0
1971-08-01          7.69      5.57  6.58     6.1
...                  ...       ...   ...     ...
2022-11-01          6.58      3.78  3.89     3.6
2022-12-01          6.42      4.10  3.62     3.5
2023-01-01          6.13      4.33  3.53     3.4
2023-02-01          6.50      4.57  3.75     3.6
2023-03-01          6.32      4.65  3.66     3.5

[624 rows x 4 columns], fields=[...], sample=[...], shape={'rows': 624, 'columns': 4}, ..., type='training', ..., target_column=['MORTGAGE30US'], ...), model=None, ...), config={...})

Exploratory Data Analysis

Univariate Analysis

Run Time Series Univariate Test Plan

vm.test_plans.describe_plan("time_series_univariate")
Attribute         Value
ID                time_series_univariate
Name              TimeSeriesUnivariate
Description       Test plan to perform time series univariate analysis.
Required Context  ['dataset']
Tests             TimeSeriesLinePlot (Metric), TimeSeriesHistogram (Metric), ACFandPACFPlot (Metric), SeasonalDecompose (Metric), AutoSeasonality (Metric), AutoStationarity (Metric), RollingStatsPlot (Metric), AutoAR (Metric), AutoMA (Metric)
Test Plans        []

test_plan_config = {
    "time_series_line_plot": {
        "columns": target_column + feature_columns
    },
    "time_series_histogram": {
        "columns": target_column + feature_columns
    },
    "acf_pacf_plot": {
        "columns": target_column + feature_columns
    },
    "auto_ar": {
        "max_ar_order": 3
    },
    "auto_ma": {
        "max_ma_order": 3
    },
    "seasonal_decompose": {
        "seasonal_model": 'additive',
        "fig_size": (40, 30)
    },
    "auto_seasonality": {
        "min_period": 1,
        "max_period": 3
    },
    "auto_stationarity": {
        "max_order": 3,
        "threshold": 0.05
    },
    "rolling_stats_plot": {
        "window_size": 12
    },
}

vm_dataset = vm.init_dataset(
    dataset=preprocessed_df
)
vm.run_test_plan("time_series_univariate", config=test_plan_config, dataset=vm_dataset)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...
The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change to unadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.
Non-invertible starting MA parameters found. Using zeros as starting parameters.
Warning: MORTGAGE30US is not stationary. Results may be inaccurate.
Warning: FEDFUNDS is not stationary. Results may be inaccurate.
Warning: GS10 is not stationary. Results may be inaccurate.

Results for Time Series Univariate Test Plan:


This test plan provides a preliminary understanding of the target variable(s) used in the time series dataset. It includes visualizations that present the raw time series data and a histogram of the target variable(s). The raw time series plot allows a visual inspection of the target variable's behavior over time, which helps identify patterns or trends in the data as well as potential outliers or anomalies. The histogram of the target variable displays the distribution of values, providing insight into the range and frequency of values observed in the data.

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following dataset metric to the ValidMind platform:

Metric Name: seasonal_decompose
Metric Type: dataset
Metric Scope:
Metric Value:
{'MORTGAGE30US': [{'Date': '1971-04-01', 'MORTGAGE30US': 7.29, 'trend': nan, 'seasonal': 0.06226307189542485, 'resid': nan}, {'Date': '1971-05-01', 'MORTGAGE30US': 7.46, 'trend': nan, 'seasonal': 0.04249183006535937, 'resid': nan}, {'Date': '1971-06-01', 'MORTGAGE30US': 7.54, 'trend': nan, 'seasonal': 0.038235294117647194, 'resid': nan}, {'Date': '1971-07-01', 'MORTGAGE30US': 7.69, 'trend': nan, 'seasonal': 0.03680555555555551, 'resid': nan}, {'Date': '1971-08-01', 'MORTGAGE30US': 7.69, 'trend': nan, 'seasonal': 0.03517156862745089, 'resid': nan}, {'Date': '1971-09-01', 'MORTGAGE30US': 7.67, 'trend': nan, 'seasonal': 0.04834150326797367, 'resid': nan}, {'Date': '1971-10-01', 'MORTGAGE30US': 7.63, 'trend': 7.493333333333333, 'seasonal': 0.02809640522875831, 'resid': 0.1085702614379084}, {'Date': '1971-11-01', 'MORTGAGE30US': 7.51, 'trend': 7.492500000000001, 'seasonal': -0.03171568627450976, 'resid': 0.049215686274508945}, {'Date': '1971-12-01', 'MORTGAGE30US': 7.48, 'trend': 7.483333333333333, 'seasonal': -0....
Metric Plots

Logged the following dataset metric to the ValidMind platform:

Metric Name: auto_seasonality
Metric Type: dataset
Metric Scope:
Metric Value:
[{'Variable': 'MORTGAGE30US', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.05982966470569989, 0.07957527856739723], 'Best Period': 1, 'Decision': 'Not Seasonality'}, {'Variable': 'FEDFUNDS', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.06235874318917296, 0.08612030906615928], 'Best Period': 1, 'Decision': 'Not Seasonality'}, {'Variable': 'GS10', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.0618689710610932, 0.08207013284725433], 'Best Period': 1, 'Decision': 'Not Seasonality'}, {'Variable': 'UNRATE', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.05563036982661475, 0.07511218969348427], 'Best Period': 1, 'Decision': 'Not Seasonality'}]

Logged the following dataset metric to the ValidMind platform:

Metric Name: auto_stationarity
Metric Type: dataset
Metric Scope:
Metric Value:
[{'Variable': 'MORTGAGE30US', 'Integration Order': 0, 'Test': 'ADF', 'p-value': 0.6719476319623869, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'MORTGAGE30US', 'Integration Order': 1, 'Test': 'ADF', 'p-value': 0.6719476319623869, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'MORTGAGE30US', 'Integration Order': 2, 'Test': 'ADF', 'p-value': 2.1564529205869017e-30, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Stationary'}, {'Variable': 'FEDFUNDS', 'Integration Order': 0, 'Test': 'ADF', 'p-value': 0.10580096854509285, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'FEDFUNDS', 'Integration Order': 1, 'Test': 'ADF', 'p-value': 0.10580096854509285, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'FEDFUNDS', 'Integration Order': 2, 'Test': 'ADF', 'p-value': 6.63287423848887e-05, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Stationary'}, {'Variable': 'G...

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following dataset metric to the ValidMind platform:

Metric Name: auto_ar
Metric Type: dataset
Metric Scope:
Metric Value:
[{'Variable': 'MORTGAGE30US', 'AR orders': [0, 1, 2, 3], 'BIC': [3261.2961769915023, 305.21183658889845, 264.1170330508455, 246.00987437282683], 'AIC': [3252.4238762547634, 291.90819703253607, 246.3852726798888, 223.85321896315943]}, {'Variable': 'FEDFUNDS', 'AR orders': [0, 1, 2, 3], 'BIC': [3503.7008305071085, 1004.9849297230782, 892.2667954401288, 877.9590390101184], 'AIC': [3494.8285297703696, 991.6812901667158, 874.5350350691721, 855.8023836004511]}, {'Variable': 'GS10', 'AR orders': [0, 1, 2, 3], 'BIC': [3227.7817471043095, 257.84600891983814, 197.38862545941524, 178.69646416841636], 'AIC': [3218.9094463675706, 244.54236936347576, 179.65686508845852, 156.53980875874896]}, {'Variable': 'UNRATE', 'AR orders': [0, 1, 2, 3], 'BIC': [2445.1823734160316, 839.4708598770724, 843.6216434176804, 847.4766293794899], 'AIC': [2436.3100726792927, 826.16722032071, 825.8898830467236, 825.3199739698224]}]

Logged the following dataset metric to the ValidMind platform:

Metric Name: auto_ma
Metric Type: dataset
Metric Scope:
Metric Value:
[{'Variable': 'MORTGAGE30US', 'MA orders': [0, 1, 2, 3], 'BIC': [3261.2961770048146, 2457.615517242118, 1830.627108731936, 1396.2242205656648], 'AIC': [3252.4238762680757, 2444.3070661370098, 1812.8825072584582, 1374.0434687238178]}, {'Variable': 'FEDFUNDS', 'MA orders': [0, 1, 2, 3], 'BIC': [3503.7008305091904, 2711.0415312005616, 2092.3283152616114, 1772.9679219992797], 'AIC': [3494.8285297724515, 2697.7330800954533, 2074.5837137881335, 1750.7871701574327]}, {'Variable': 'GS10', 'MA orders': [0, 1, 2, 3], 'BIC': [3227.781747186831, 2417.607411274255, 1773.7515293082215, 1356.7335414860474], 'AIC': [3218.909446450092, 2404.2989601691465, 1756.0069278347437, 1334.5527896442004]}, {'Variable': 'UNRATE', 'MA orders': [0, 1, 2, 3], 'BIC': [2445.1823734251507, 1785.6606908358158, 1465.0916869814555, 1247.22474088198], 'AIC': [2436.3100726884118, 1772.3522397307074, 1447.3470855079777, 1225.043989040133]}]
TimeSeriesUnivariate(test_context=TestContext(dataset=Dataset(raw_dataset=            MORTGAGE30US  FEDFUNDS  GS10  UNRATE
DATE                                            
1971-04-01          7.29      4.16  5.83     5.9
1971-05-01          7.46      4.63  6.39     5.9
1971-06-01          7.54      4.91  6.52     5.9
1971-07-01          7.69      5.31  6.73     6.0
1971-08-01          7.69      5.57  6.58     6.1
...                  ...       ...   ...     ...
2022-11-01          6.58      3.78  3.89     3.6
2022-12-01          6.42      4.10  3.62     3.5
2023-01-01          6.13      4.33  3.53     3.4
2023-02-01          6.50      4.57  3.75     3.6
2023-03-01          6.32      4.65  3.66     3.5

[624 rows x 4 columns], fields=[...], sample=[...], shape={'rows': 624, 'columns': 4}, ..., type='training', ..., target_column=None, ...), model=None, ..., context_data={'seasonal_decompose': {...}}), config={...})
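
The rolling_stats_plot entry in the configuration above uses a window_size of 12. As a rough sketch of the statistics behind such a plot, and not ValidMind's implementation, the example below computes a 12-period rolling mean and standard deviation with pandas on a made-up, steadily rising series named toy_rate.

```python
import pandas as pd

# Hypothetical monthly series with a pure linear trend: 0, 1, 2, ..., 23
toy_index = pd.date_range("2020-01-01", periods=24, freq="MS")
series = pd.Series(range(24), index=toy_index, dtype=float, name="toy_rate")

window_size = 12  # same window as in the test plan configuration

# The first window_size - 1 values are NaN because the window is incomplete
rolling_mean = series.rolling(window=window_size).mean()
rolling_std = series.rolling(window=window_size).std()

print(pd.DataFrame({"mean": rolling_mean, "std": rolling_std}).dropna().head())
```

In this toy case the rolling mean keeps drifting upward while the rolling standard deviation stays constant, which is the visual signature of a series that is non-stationary in its mean.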

Multivariate Analysis

Run Time Series Multivariate Test Plan

vm.test_plans.describe_plan("time_series_multivariate")
Attribute         Value
ID                time_series_multivariate
Name              TimeSeriesMultivariate
Description       Test plan to perform time series multivariate analysis.
Required Context  ['dataset']
Tests             ScatterPlot (Metric), LaggedCorrelationHeatmap (Metric), SpreadPlot (Metric)
Test Plans        []

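The plan description lists SpreadPlot among its tests. As a rough illustration only, and assuming a "spread" here means the difference between two series over time (this is not taken from the ValidMind source), the sketch below computes every pairwise spread with pandas. The helper name pairwise_spreads is hypothetical, and the sample rows are the first three observations shown in this notebook's output.

```python
from itertools import combinations

import pandas as pd


def pairwise_spreads(df):
    """Return one column per unordered pair of columns, holding their difference."""
    # Assumption: a spread is simply series A minus series B at each date.
    spreads = {
        f"{a} - {b}": df[a] - df[b] for a, b in combinations(df.columns, 2)
    }
    return pd.DataFrame(spreads, index=df.index)


# First three rows of the FRED sample displayed in this notebook's output.
sample = pd.DataFrame(
    {
        "MORTGAGE30US": [7.29, 7.46, 7.54],
        "FEDFUNDS": [4.16, 4.63, 4.91],
        "GS10": [5.83, 6.39, 6.52],
    },
    index=pd.to_datetime(["1971-04-01", "1971-05-01", "1971-06-01"]),
)

spreads = pairwise_spreads(sample)
print(spreads.round(2))
```

A spread that drifts without bound suggests the two series do not move together in the long run, which is one reason to look at spreads before modeling rates jointly.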
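# Parameters for individual tests in the plan
test_plan_config = {
    "scatter_plot": {
        "columns": target_column + feature_columns
    },
    "lagged_correlation_heatmap": {
        "target_col": target_column,
        "independent_vars": feature_columns
    },
    # Note: engle_granger_coint is not among the tests listed for this plan above
    "engle_granger_coint": {
        "threshold": 0.05
    },
}

# Run the multivariate test plan on vm_dataset
vm.run_test_plan("time_series_multivariate", config=test_plan_config, dataset=vm_dataset)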

Results for Time Series Multivariate Test Plan:


This test plan provides a preliminary view of the features in a multivariate dataset and the relationships among them. Its visualizations help identify patterns, trends, and relationships between pairs of variables, as well as potential outliers or anomalies. The distribution of each individual feature can also be explored to show the range and frequency of the values observed in the data. Overall, the plan gives an overview of the data structure to guide further exploration and modeling.
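To make the idea of relationships between pairs of variables more concrete, here is a minimal sketch of a lagged correlation table built with plain pandas. It is not the ValidMind implementation of LaggedCorrelationHeatmap; the function name lagged_correlations, the max_lag parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def lagged_correlations(df, target, features, max_lag=3):
    """Correlate the target with each feature shifted back by 0..max_lag periods."""
    rows = {}
    for feature in features:
        # shift(lag) aligns the feature value from `lag` periods ago with today's target
        rows[feature] = [
            df[target].corr(df[feature].shift(lag)) for lag in range(max_lag + 1)
        ]
    columns = [f"lag_{lag}" for lag in range(max_lag + 1)]
    return pd.DataFrame(rows, index=columns).T


# Synthetic monthly data (not the FRED series): y follows x with a 2-month delay.
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=200, freq="MS")
x = pd.Series(rng.normal(size=200), index=index)
demo = pd.DataFrame({"x": x, "y": x.shift(2) + 0.1 * rng.normal(size=200)}).dropna()

table = lagged_correlations(demo, target="y", features=["x"], max_lag=3)
print(table.round(2))
```

In this synthetic example the correlation peaks at lag 2 because y was constructed from x two periods earlier; on real data such as the FRED rates, a peak at a nonzero lag would hint that a feature leads the target.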

Logged the following plot to the ValidMind platform:

Metric Plots

Logged the following plot to the ValidMind platform:

Metric Plots

Logged the following plots to the ValidMind platform:

Metric Plots